OkCupid Users Analysis

Introduction

OkCupid is a mobile dating app. It sets itself apart from other dating apps by using a precomputed compatibility score, calculated from users' answers to optional questions.

This dataset contains about 60,000 records with structured information, such as age, sex and orientation, as well as text data from open-ended descriptions.

Here is the link to the OkCupid dataset.

Load and Check Dataset
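A minimal loading-and-checking sketch using only Python's standard library. It assumes the data is a CSV file with one row per user and that missing values show up as empty cells. The file name `profiles.csv` in the comment is an assumption, and the in-memory sample is hypothetical.

```python
import csv
import io
from collections import Counter


def load_profiles(source):
    """Read a CSV file object into a list of dicts (one per user)."""
    return list(csv.DictReader(source))


def check_profiles(rows):
    """Return the row count, the column names and the number of empty cells per column."""
    columns = list(rows[0].keys()) if rows else []
    missing = Counter()
    for row in rows:
        for col in columns:
            # Short rows yield None from DictReader, so treat None like an empty cell.
            if not (row[col] or "").strip():
                missing[col] += 1
    return len(rows), columns, dict(missing)


# Tiny hypothetical sample; on the real data, pass
# open("profiles.csv", newline="", encoding="utf-8") instead (file name assumed).
sample = io.StringIO("age,sex,orientation,status\n22,m,straight,single\n35,f,,single\n")
n_rows, cols, missing = check_profiles(load_profiles(sample))
print(n_rows, cols, missing)
```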

OkCupid Users' Sex, Orientation and Status

This chart tells us a bit more about OkCupid users. Unsurprisingly, there are more male users than female users, but the ratio is fairly balanced for a dating app: 60% men and 40% women.

86% of users are straight, which means that a non-negligible share of users (14%!) are gay or bisexual.

Obviously, most users describe themselves as "single" (it's a dating app, after all), but a closer look at the data shows that a significant share of bisexual OkCupid users aren't actually single! Most of these users describe themselves as "seeing someone" or... "available" (?).
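The percentages above come from simple counts, and the point about bisexual users comes from looking at status within each orientation rather than across all users. A minimal sketch with the standard library, assuming columns named `sex`, `orientation` and `status`. The records below are hypothetical, not real data.

```python
from collections import Counter, defaultdict

# Hypothetical toy records; the real dataset has ~60,000 rows.
users = [
    {"sex": "m", "orientation": "straight", "status": "single"},
    {"sex": "f", "orientation": "straight", "status": "single"},
    {"sex": "m", "orientation": "bisexual", "status": "seeing someone"},
    {"sex": "f", "orientation": "bisexual", "status": "single"},
    {"sex": "m", "orientation": "gay", "status": "single"},
]


def shares(rows, column):
    """Percentage of rows taking each value of `column`."""
    counts = Counter(row[column] for row in rows)
    return {value: 100 * n / len(rows) for value, n in counts.items()}


def status_by_orientation(rows):
    """Within each orientation, the percentage of users per relationship status."""
    groups = defaultdict(list)
    for row in rows:
        groups[row["orientation"]].append(row)
    return {orientation: shares(group, "status") for orientation, group in groups.items()}


print(shares(users, "sex"))
print(status_by_orientation(users)["bisexual"])
```

Normalizing within each orientation group is what makes a minority pattern visible: in overall counts, the much larger straight group would dominate every status.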

How are OkCupid users' locations spread around the San Francisco Bay?

Removing users located outside California
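A sketch of the filtering step, assuming the `location` column holds lowercase "city, state" strings (an assumption about the format). The sample is hypothetical.

```python
# Hypothetical sample; `location` is assumed to hold "city, state" strings.
users = [
    {"location": "san francisco, california"},
    {"location": "oakland, california"},
    {"location": "new york, new york"},
    {"location": "berkeley, california"},
]


def in_california(location):
    """True if the state part (after the last comma) of a "city, state" string is California."""
    _, _, state = location.rpartition(",")
    return state.strip().lower() == "california"


california_users = [u for u in users if in_california(u["location"])]
print(len(california_users))  # 3
```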

Getting the latitude and longitude for each city

Adding the latitude and longitude to each location
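Once each city has coordinates (from a geocoding service or a lookup table), attaching them is a dictionary lookup on the city part of `location`. The `CITY_COORDS` table below is a hypothetical stand-in for the geocoding output, with approximate city-centre values. Cities missing from the table get `None` so they can be inspected rather than silently dropped.

```python
# Hypothetical lookup table standing in for the geocoding step (approximate values).
CITY_COORDS = {
    "san francisco": (37.77, -122.42),
    "oakland": (37.80, -122.27),
    "berkeley": (37.87, -122.27),
}


def add_coordinates(rows, coords):
    """Return copies of rows with `lat`/`lon` keys taken from the city part of `location`."""
    enriched = []
    for row in rows:
        city = row["location"].rpartition(",")[0].strip().lower()
        lat, lon = coords.get(city, (None, None))
        enriched.append({**row, "lat": lat, "lon": lon})
    return enriched


users = [{"location": "oakland, california"}, {"location": "alameda, california"}]
print(add_coordinates(users, CITY_COORDS))
```

Geocoding once per distinct city and then joining, instead of once per user, keeps the number of lookups down to the number of cities.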


Unsurprisingly, most OkCupid users in the San Francisco area are located in... San Francisco! Interestingly, this map shows that the most active cities after San Francisco are Oakland (7,214 users) and Berkeley (4,212 users), both near the UC Berkeley campus. There also seems to be some activity around Stanford University (near Palo Alto). Students apparently have a strong interest in dating apps.
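The per-city figures quoted above are plain counts. A sketch using `collections.Counter`, assuming the "city, state" location format. The sample locations are hypothetical.

```python
from collections import Counter

# Hypothetical sample of California users' "city, state" locations.
locations = [
    "san francisco, california",
    "oakland, california",
    "san francisco, california",
    "berkeley, california",
    "oakland, california",
    "san francisco, california",
]

# Count users per city (the part before the comma) and rank cities by activity.
city_counts = Counter(loc.rpartition(",")[0].strip() for loc in locations)
print(city_counts.most_common(3))  # [('san francisco', 3), ('oakland', 2), ('berkeley', 1)]
```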

Median age of users by San Francisco Bay area


OkCupid users are mostly young, and the median age in most areas is between 25 and 32. Nevertheless, there is a non-negligible number of users beyond the Golden Gate Bridge who are in their late 30s or 40s!
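A per-area median can be computed by grouping ages by area and taking `statistics.median` of each group. The median is less sensitive than the mean to a few very young or very old users in a small area. A sketch with hypothetical (area, age) pairs and placeholder area names:

```python
from collections import defaultdict
from statistics import median

# Hypothetical (area, age) pairs with placeholder area names.
records = [
    ("area_a", 27), ("area_a", 31), ("area_a", 29),
    ("area_b", 38), ("area_b", 44),
]


def median_by_group(pairs):
    """Median of the numeric values for each group key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return {key: median(values) for key, values in groups.items()}


print(median_by_group(records))  # {'area_a': 29, 'area_b': 41.0}
```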

Median salary of users by San Francisco Bay area


As this map shows, the highest-earning users live in San Francisco and Silicon Valley, which makes sense. On the other side of the Bay, users' wages are much lower, probably because many of them are students or are just starting their careers.
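One practical detail when computing median salaries: in the public OkCupid export, users who don't report an income appear to be coded as `-1` rather than left blank (an assumption worth checking on the data). Keeping those values would drag every median down, so they should be dropped first. A sketch with hypothetical incomes for a single area:

```python
from statistics import median

MISSING_INCOME = -1  # assumed sentinel for "income not reported"

# Hypothetical incomes (USD/year) for users of one area.
incomes = [-1, 80000, -1, 150000, 60000, -1]


def median_income(values, missing=MISSING_INCOME):
    """Median of reported incomes, ignoring the missing-value sentinel (None if none reported)."""
    reported = [v for v in values if v != missing]
    return median(reported) if reported else None


print(median_income(incomes))  # 80000
print(median(incomes))         # 29999.5, the misleading naive median
```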

OkCupid user count by age and sex

Many OkCupid users aren't as young as we might think: almost 20,000 of them are in their 30s, and several thousand are in their 40s or 50s.
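Counts "in their 30s" come from bucketing ages by decade before counting per sex. A sketch with hypothetical (age, sex) pairs:

```python
from collections import Counter

# Hypothetical (age, sex) pairs.
users = [(24, "m"), (31, "f"), (35, "m"), (38, "m"), (42, "f"), (57, "m")]

# Bucket ages by decade (e.g. 35 -> 30) and count users per (decade, sex).
counts = Counter((age // 10 * 10, sex) for age, sex in users)
print(sorted(counts.items()))
```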

OkCupid users' main occupation

The most represented category is "other", which doesn't tell us much, but very few users describe themselves as "unemployed". An unemployment rate computed from this data would be far below the actual unemployment rate (then again, it's not a very attractive thing to mention on a dating app, right?). Otherwise, students, engineers and programmers are well represented on this chart, which makes sense given the area we're analyzing (Silicon Valley).
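The text doesn't say how the unemployment rate would be computed. One rough approximation (an assumption) is the share of "unemployed" among users who report a job, leaving out categories that aren't part of the labour force or carry no information ("student", "retired", "rather not say"). Official rates use a stricter labour-force definition, so this is only a ballpark comparison. A sketch with hypothetical values of the `job` column:

```python
from collections import Counter

# Hypothetical values of the `job` column.
jobs = ["student", "other", "computer / hardware / software", "unemployed",
        "retired", "rather not say", "other", "sales / marketing"]

EXCLUDED = {"student", "retired", "rather not say"}  # assumed exclusions from the labour force

counts = Counter(jobs)
labour_force = sum(n for job, n in counts.items() if job not in EXCLUDED)
unemployment_rate = 100 * counts["unemployed"] / labour_force
print(f"{unemployment_rate:.1f}%")  # 20.0%
```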

What is the median salary for each job category ?

Managers earn the most, with a median salary of 100,000 USD/year (!). The clerical/administrative category has the lowest median salary of all, around 25,000 USD/year. I deliberately excluded the student, retired, "other" and "rather not say" categories.
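The ranking above can be reproduced by dropping the excluded categories (and unreported incomes, assumed to be coded as `-1`), taking the median income per job and sorting. The job labels and incomes below are hypothetical.

```python
from collections import defaultdict
from statistics import median

EXCLUDED_JOBS = {"student", "retired", "other", "rather not say"}
MISSING_INCOME = -1  # assumed sentinel for unreported income

# Hypothetical (job, income) pairs.
rows = [
    ("executive / management", 100000), ("executive / management", 150000),
    ("clerical / administrative", 20000), ("clerical / administrative", 30000),
    ("student", 20000), ("sales / marketing", -1), ("sales / marketing", 60000),
]

incomes_by_job = defaultdict(list)
for job, income in rows:
    if job not in EXCLUDED_JOBS and income != MISSING_INCOME:
        incomes_by_job[job].append(income)

medians = {job: median(values) for job, values in incomes_by_job.items()}
ranked = sorted(medians.items(), key=lambda item: item[1], reverse=True)
print(ranked[0], ranked[-1])  # highest and lowest median
```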

Drug Consumption Rate vs Age

We can see that drug consumption tends to decrease as users get older, especially during their 20s, when the rate drops from 30% to 17% (probably due to the transition from student life to the professional world). There's also a surprising peak in the 60s (people enjoying their retirement?).
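The text doesn't define "consumption rate". One plausible definition (an assumption) is, for each age, the share of users answering the `drugs` question with "sometimes" or "often" among those who answered at all. A sketch with hypothetical answers:

```python
from collections import defaultdict

USES_DRUGS = {"sometimes", "often"}  # assumed answer values counted as consumption

# Hypothetical (age, drugs answer) pairs; None marks an unanswered question.
answers = [(22, "sometimes"), (22, "never"), (22, "never"), (22, "often"),
           (29, "never"), (29, "sometimes"), (29, None), (29, "never")]


def drug_rate_by_age(pairs):
    """Percentage of users per age who report using drugs, among those who answered."""
    answered = defaultdict(int)
    users = defaultdict(int)
    for age, answer in pairs:
        if answer is None:
            continue  # unanswered questions don't count in the denominator
        answered[age] += 1
        users[age] += int(answer in USES_DRUGS)
    return {age: 100 * users[age] / answered[age] for age in answered}


print(drug_rate_by_age(answers))
```

Rates at ages with few users rest on small samples, so it helps to check `answered[age]` next to the rate before reading much into a single peak.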

Most Used Words/Expressions in OkCupid Users Descriptions

Most Used Words/Expressions in OkCupid Male Users Descriptions (WordCloud)

Most Used Words/Expressions in OkCupid Female Users Descriptions (WordCloud)

There are many similarities between male and female users' descriptions: both use vocabulary focused on love, sharing life moments and connecting with people. Almost all of the words are "positive"!
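A word cloud is essentially a picture of word frequencies. A sketch of the counting step with the standard library; the tiny stop-word list and the essay snippets are hypothetical, and a real analysis would use a fuller stop-word list and run this separately on male and female users' descriptions.

```python
import re
from collections import Counter

# Small illustrative stop-word list (an assumption; real lists are much longer).
STOP_WORDS = {"i", "a", "the", "and", "to", "of", "my", "with", "in", "is"}


def word_frequencies(texts, stop_words=STOP_WORDS):
    """Count lowercase words across texts, skipping stop words and tokens of 1-2 letters."""
    counts = Counter()
    for text in texts:
        for word in re.findall(r"[a-z']+", text.lower()):
            if word not in stop_words and len(word) > 2:
                counts[word] += 1
    return counts


# Hypothetical essay snippets.
essays = ["I love music and sharing good food with friends.",
          "Looking to connect with people who love music."]
print(word_frequencies(essays).most_common(2))  # [('love', 2), ('music', 2)]
```

The resulting counts can be passed to a word-cloud renderer, for example the `wordcloud` package's `WordCloud().generate_from_frequencies(counts)`.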